Generation of Policy-Level Explanations for Reinforcement Learning
Though reinforcement learning has greatly benefited from the incorporation of
neural networks, the inability to verify the correctness of such systems limits
their use. Current work in explainable deep learning focuses on explaining only
a single decision in terms of input features, making it unsuitable for
explaining a sequence of decisions. To address this need, we introduce
Abstracted Policy Graphs, which are Markov chains of abstract states. This
representation concisely summarizes a policy so that individual decisions can
be explained in the context of expected future transitions. Additionally, we
propose a method to generate these Abstracted Policy Graphs for deterministic
policies given a learned value function and a set of observed transitions,
potentially off-policy transitions used during training. Since no restrictions
are placed on how the value function is generated, our method is compatible
with many existing reinforcement learning methods. We prove that the worst-case
time complexity of our method is quadratic in the number of features and linear
in the number of provided transitions. By applying
our method to a family of domains, we show that our method scales well in
practice and produces Abstracted Policy Graphs which reliably capture
relationships within these domains.
Comment: Accepted to Proceedings of the Thirty-Third AAAI Conference on
Artificial Intelligence (2019)
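To make the idea of a Markov chain over abstract states concrete, the following is a minimal sketch that estimates transition probabilities between abstract states from a set of observed transitions. It is not the paper's method: the abstraction function here is supplied by hand, whereas the abstract describes deriving the abstraction from a learned value function. The names `abstract_policy_graph` and `toy_abstraction` and the toy data are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def abstract_policy_graph(transitions, abstraction):
    """Estimate a Markov chain over abstract states from observed transitions.

    transitions: iterable of (state, next_state) pairs.
    abstraction: hypothetical function mapping a state to an abstract-state label.
    Returns {abstract_state: {next_abstract_state: estimated probability}}.
    """
    counts = defaultdict(Counter)
    for state, next_state in transitions:
        counts[abstraction(state)][abstraction(next_state)] += 1

    graph = {}
    for src, outgoing in counts.items():
        total = sum(outgoing.values())
        # Normalize counts so each row of the chain sums to 1.
        graph[src] = {dst: n / total for dst, n in outgoing.items()}
    return graph


# Toy example (assumed data): integer states grouped into "low" and "high".
def toy_abstraction(state):
    return "low" if state < 5 else "high"


toy_transitions = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7)]
graph = abstract_policy_graph(toy_transitions, toy_abstraction)
```

In this toy case the "low" abstract state mostly transitions to itself and occasionally to "high", which is the kind of summary that lets a single decision be read in the context of expected future transitions.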
Language-based sensing descriptors for robot object grounding
In this work, we consider an autonomous robot that is required
to understand commands given by a human through natural language.
Specifically, we assume that this robot is provided with an internal
representation of the environment. However, such a representation is unknown
to the user. In this context, we address the problem of allowing a
human to understand the robot's internal representation through dialog.
To this end, we introduce the concept of sensing descriptors. Such representations
are used by the robot to recognize unknown object properties
in the given commands and warn the user about them. Additionally, we
show how these properties can be learned over time by leveraging past
interactions in order to enhance the grounding capabilities of the robot.
Graph-based task libraries for robots: generalization and autocompletion
In this paper, we consider an autonomous robot that persists
over time performing tasks and the problem of providing one additional
task to the robot's task library. We present an approach to generalize
tasks, represented as parameterized graphs with sequences, conditionals,
and looping constructs of sensing and actuation primitives. Our approach
performs graph-structure task generalization, while maintaining task
executability and parameter value distributions. We present an algorithm
that, given the initial steps of a new task, proposes an autocompletion
based on a recognized past similar task. Our generalization and
autocompletion contributions are effective on different real robots. We show
concrete examples of the robot primitives and task graphs, as well as
results, with Baxter. In experiments with multiple tasks, we show a
significant reduction in the number of new task steps to be provided.
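The autocompletion idea can be illustrated with a deliberately simplified sketch. It treats each task as a flat list of primitive steps and matches the given initial steps exactly, whereas the abstract describes parameterized graphs with conditionals and loops and a recognized similar task. The function `propose_autocompletion`, the library contents, and the primitive names are all hypothetical.

```python
def propose_autocompletion(prefix, library):
    """Propose the remaining steps of a past task that starts with `prefix`.

    prefix: list of the initial steps given for a new task.
    library: dict mapping a task name to its list of steps (assumed flat).
    Returns (task_name, remaining_steps), or None if no past task matches.
    """
    prefix = list(prefix)
    for name, steps in library.items():
        # A past task matches only if it is longer than the prefix
        # and begins with exactly the same steps.
        if len(steps) > len(prefix) and steps[:len(prefix)] == prefix:
            return name, steps[len(prefix):]
    return None


# Hypothetical task library with made-up primitive names.
library = {
    "wave": ["raise_arm", "wave_hand", "lower_arm"],
    "fetch_cup": ["move_to(table)", "detect(cup)", "grasp(cup)",
                  "move_to(user)", "release(cup)"],
}

proposal = propose_autocompletion(["move_to(table)", "detect(cup)"], library)
no_match = propose_autocompletion(["open_door"], library)
```

Here two given steps are enough to propose the remaining three, which mirrors the reported reduction in the number of new task steps a user has to provide.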
Automated Formula Generation and Performance Learning for the FFT
A single signal processing algorithm can be represented by many different but mathematically equivalent formulas. When these formulas are implemented in actual code, they often have very different running times. Thus, an important problem is finding a formula that implements the signal processing algorithm as efficiently as possible. In this paper we present three major results toward this goal: (1) different but mathematically equivalent formulas can be generated automatically in a principled way, (2) simple features describing formulas can be used to distinguish formulas with significantly different running times, and (3) a function approximator can learn to accurately predict the running time of a formula given a limited set of training data.
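As a small illustration of the claim that one transform admits different but mathematically equivalent formulas, the sketch below computes the same discrete Fourier transform two ways: directly from the definition, and with a recursive radix-2 Cooley-Tukey factorization. This is not the paper's formula generator; the function names and the test signal are assumptions, and the recursive version assumes the input length is a power of two.

```python
import cmath


def dft_direct(x):
    """DFT from the definition: O(n^2) multiply-adds."""
    n = len(x)
    return [
        sum(x[k] * cmath.exp(-2j * cmath.pi * m * k / n) for k in range(n))
        for m in range(n)
    ]


def fft_radix2(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft_radix2(x[0::2])
    odd = fft_radix2(x[1::2])
    half = n // 2
    out = [0j] * n
    for m in range(half):
        # Twiddle factor combines the two half-size transforms.
        t = cmath.exp(-2j * cmath.pi * m / n) * odd[m]
        out[m] = even[m] + t
        out[m + half] = even[m] - t
    return out


# Assumed test signal of length 8.
signal = [1.0, 2.0, 0.0, -1.0, 3.0, 0.5, -2.0, 1.5]
direct = dft_direct(signal)
fast = fft_radix2(signal)
max_error = max(abs(a - b) for a, b in zip(direct, fast))
```

Both functions return the same values up to floating-point rounding, yet they perform different numbers of operations, which is why the choice of formula affects running time.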